Nyström Approximation for Sparse Kernel Methods: Theoretical Analysis and Empirical Evaluation

Authors

  • Zenglin Xu
  • Rong Jin
  • Bin Shen
  • Shenghuo Zhu
Abstract

Nyström approximation is an effective approach to accelerating the computation of kernel matrices in many kernel methods. In this paper, we consider the Nyström approximation for sparse kernel methods. Instead of relying on a low-rank assumption on the original kernels, which does not hold in some applications, we take advantage of the restricted eigenvalue condition, which has been shown to be robust for sparse kernel methods. Based on the restricted eigenvalue condition, we provide not only an approximation bound for the original kernel matrix but also a recovery bound for the sparse solutions of sparse kernel regression. In addition to the theoretical analysis, we demonstrate the good performance of the Nyström approximation for sparse kernel regression on real-world data sets.

Introduction

Kernel methods (Schölkopf and Smola 2002; Xu et al. 2009) have received a lot of attention in recent machine learning research. These methods project data into high-dimensional or even infinite-dimensional spaces via kernel mapping functions. Despite the strong generalization ability they provide, kernel methods usually suffer from the high computational cost of calculating the kernel matrix (also called the Gram matrix). Although low-rank decomposition techniques (e.g., Cholesky decomposition (Fine and Scheinberg 2002; Bach and Jordan 2005)) and truncation methods (e.g., kernel tapering (Shen, Xu, and Allebach 2014; Furrer, Genton, and Nychka 2006)) can accelerate these computations, they still require the kernel matrix to be formed. An effective way to avoid the cost of computing the entire kernel matrix is to approximate it with the Nyström method (Williams and Seeger 2001), which provides a low-rank approximation to the kernel matrix by sampling a subset of its columns. The Nyström method has proven useful in a number of applications, such as image processing (Fowlkes et al. 2004; Wang et al. 2009), which typically involve computations with large dense matrices. Recent research on the Nyström method (Zhang, Tsang, and Kwok 2008; Farahat, Ghodsi, and Kamel 2011; Talwalkar and Rostamizadeh 2010; Kumar, Mohri, and Talwalkar 2012; Mackey, Talwalkar, and Jordan 2011; Gittens and Mahoney 2013) has shown that the approximation error can be theoretically bounded. Jin et al. (2013) further show that the approximation error bound can be improved from O(n/√m) to O(n/m^{p-1}) (where n denotes the number of instances and m the number of sampled columns) when the eigenvalues of the kernel matrix follow a p-power-law distribution.
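As a concrete reference for the construction just described, the following is a minimal NumPy sketch of the standard Nyström approximation with uniform column sampling; the RBF kernel, the sample size, and all variable names are illustrative choices rather than the exact setup used in the paper.

import numpy as np

def rbf_kernel(X, Y, gamma=1.0):
    """Gaussian RBF kernel matrix between the rows of X and Y."""
    sq = (X ** 2).sum(1)[:, None] + (Y ** 2).sum(1)[None, :] - 2.0 * X @ Y.T
    return np.exp(-gamma * sq)

def nystrom_approximation(X, m, gamma=1.0, seed=None):
    """Rank-m Nyström approximation K ~= C W^+ C^T from m sampled columns."""
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    idx = rng.choice(n, size=m, replace=False)   # uniformly sampled landmarks
    C = rbf_kernel(X, X[idx], gamma)             # n x m block of sampled columns
    W = C[idx]                                   # m x m block on the landmarks
    return C @ np.linalg.pinv(W) @ C.T           # low-rank approximation of K

# Usage: compare against the exact kernel matrix on a small problem.
X = np.random.default_rng(0).normal(size=(500, 10))
K = rbf_kernel(X, X)
K_hat = nystrom_approximation(X, m=50, gamma=1.0, seed=0)
print(np.linalg.norm(K - K_hat, "fro") / np.linalg.norm(K, "fro"))

Only the n x m block C and the m x m block W need to be formed during training, which is what makes the method attractive when computing and storing the full n x n kernel matrix is infeasible.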
In this paper, we focus on the approximation bound of the Nyström method for sparse kernel methods. Although previous studies have established good approximation bounds for the Nyström method, most of them rely on the assumption that the kernel matrix is low rank (Jin et al. 2013); when the kernel is not low rank, the Nyström approximation can lead to suboptimal performance. To relax this strong assumption when deriving approximation bounds, we instead make the more general assumption that the design matrix K satisfies the restricted isometry property (Koltchinskii 2011). In particular, the new assumption obeys the restricted eigenvalue condition (Koltchinskii 2011; Bickel, Ritov, and Tsybakov 2009), which has been shown to be more general than several related assumptions used in the sparsity literature (Candes and Tao 2007; Donoho, Elad, and Temlyakov 2006; Zhang and Huang 2008). Based on the restricted eigenvalue condition, we provide error bounds for kernel approximation and a recovery rate for sparse kernel regression, so the sparse solution can be accurately recovered even from a modest number of random samples. It is also worth noting that the expected risk of the learned function remains small, which follows from the generalization error bound for data-dependent hypothesis spaces (Shi, Feng, and Zhou 2011). To further evaluate the performance of the Nyström method for sparse kernel regression, we conduct experiments on both synthetic and real-world data sets. The experimental results show that the Nyström method greatly accelerates training while maintaining the same level of prediction error.
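The experiments mentioned above pair the Nyström approximation with sparse (l1-regularized) kernel regression. One common way to combine the two, sketched below with scikit-learn, is to map the data through a Nystroem feature map and then fit a Lasso model on the resulting features; the synthetic data, kernel parameters, and regularization strength here are assumptions for illustration and not a reproduction of the authors' experimental setup.

import numpy as np
from sklearn.kernel_approximation import Nystroem
from sklearn.linear_model import Lasso
from sklearn.pipeline import make_pipeline

# Synthetic regression data in which the target depends on only a few inputs.
rng = np.random.default_rng(0)
X = rng.normal(size=(2000, 20))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + 0.1 * rng.normal(size=2000)

# Nystroem maps the data into an m-dimensional feature space whose inner
# products approximate the RBF kernel; Lasso then yields a sparse solution.
model = make_pipeline(
    Nystroem(kernel="rbf", gamma=0.1, n_components=100, random_state=0),
    Lasso(alpha=0.01, max_iter=10000),
)
model.fit(X, y)
print("nonzero coefficients:", np.count_nonzero(model[-1].coef_))

Because the Nystroem step only evaluates the kernel between all points and the sampled landmarks, the cost grows linearly in the number of instances for a fixed number of components, which is where the training-time acceleration comes from.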

Similar articles

Revisiting the Nyström method for improved large-scale machine learning

We reconsider randomized algorithms for the low-rank approximation of symmetric positive semi-definite (SPSD) matrices such as Laplacian and kernel matrices that arise in data analysis and machine learning applications. Our main results consist of an empirical evaluation of the performance quality and running time of sampling and projection methods on a diverse suite of SPSD matrices. Our resul...

Fast DPP Sampling for Nyström with Application to Kernel Methods

The Nyström method has long been popular for scaling up kernel methods. Its theoretical guarantees and empirical performance rely critically on the quality of the landmarks selected. We study landmark selection for Nyström using Determinantal Point Processes (DPPs), discrete probability models that allow tractable generation of diverse samples. We prove that landmarks selected via DPPs guarante...
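The abstract above is truncated, but the core idea is to choose Nyström landmarks that are diverse under the kernel. Purely as an illustration, the sketch below greedily picks landmarks that maximize the log-determinant of the kernel submatrix, a MAP-style heuristic in the spirit of a DPP over columns; the paper itself studies sampling from an actual DPP, and the function and parameter names here are hypothetical.

import numpy as np

def greedy_diverse_landmarks(K, m, jitter=1e-8):
    """Greedily select m landmark indices that maximize log det K[S, S].

    A diversity-seeking heuristic related to (but not the same as) drawing
    the landmark set from a determinantal point process with kernel K.
    """
    n = K.shape[0]
    selected = []
    for _ in range(m):
        best_j, best_val = None, -np.inf
        for j in range(n):
            if j in selected:
                continue
            S = selected + [j]
            sub = K[np.ix_(S, S)] + jitter * np.eye(len(S))
            sign, logdet = np.linalg.slogdet(sub)
            if sign > 0 and logdet > best_val:
                best_j, best_val = j, logdet
        selected.append(best_j)
    return selected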

Reduced Set KPCA for Improving the Training and Execution Speed of Kernel Machines

This paper presents a practical, and theoretically well-founded, approach to improve the speed of kernel manifold learning algorithms relying on spectral decomposition. Utilizing recent insights in kernel smoothing and learning with integral operators, we propose Reduced Set KPCA (RSKPCA), which also suggests an easy-to-implement method to remove or replace samples with minimal effect on the empi...

Ensemble Nyström Method

A crucial technique for scaling kernel methods to very large data sets reaching or exceeding millions of instances is based on low-rank approximation of kernel matrices. We introduce a new family of algorithms based on mixtures of Nyström approximations, ensemble Nyström algorithms, that yield more accurate low-rank approximations than the standard Nyström method. We give a detailed study of va...
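The mixture idea is simple enough to sketch directly: run several independent Nyström approximations with different landmark sets and average them. Uniform mixture weights are assumed below for simplicity; this is an illustrative sketch, not the exact weighting schemes studied in the paper.

import numpy as np

def ensemble_nystrom(K, m, n_experts=5, seed=None):
    """Average n_experts independent rank-m Nyström approximations of K."""
    rng = np.random.default_rng(seed)
    n = K.shape[0]
    K_hat = np.zeros_like(K)
    for _ in range(n_experts):
        idx = rng.choice(n, size=m, replace=False)    # landmarks for this expert
        C = K[:, idx]                                 # sampled columns
        W_pinv = np.linalg.pinv(K[np.ix_(idx, idx)])  # landmark block pseudo-inverse
        K_hat += C @ W_pinv @ C.T
    return K_hat / n_experts                          # uniform mixture of the experts

In practice only the sampled columns of K would be formed for each expert; the full matrix is passed in here only to keep the sketch short.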

Non-parametric Group Orthogonal Matching Pursuit for Sparse Learning with Multiple Kernels

We consider regularized risk minimization in a large dictionary of Reproducing kernel Hilbert Spaces (RKHSs) over which the target function has a sparse representation. This setting, commonly referred to as Sparse Multiple Kernel Learning (MKL), may be viewed as the non-parametric extension of group sparsity in linear models. While the two dominant algorithmic strands of sparse learning, namely...


Journal: Proceedings of the Twenty-Ninth AAAI Conference on Artificial Intelligence (AAAI 2015)

Publication year: 2015